Author Profiling, instance-based Similarity Classification

نویسندگان

  • Yaritza Adame Arcia
  • Daniel Castro-Castro
  • Reynier Ortega Bueno
  • Rafael Muñoz
چکیده

In digital documents analysis for forensic applications, when anonymous documents are presented and it is not possible with the available tools to determine the true author of the document, there are of vital importance methods that identify the characteristics of the Author Profile (Gender, Age, Personality, etc.). We propose to use a simple method of classification based on the similarity between objects, considering different features for documents representation: (a document corresponds to a set of tweets of a user), the terms used in the tweets, as well as characteristics of opinion and subjectivity presented in them. Our goal will be to classify, based on the content of the tweets, the Gender and language variety of an author from an unknown set of tweets corresponding to him. In the experiments we observed good results in Gender classification, but low values in language variety classification. We processed only the English dataset.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

متن کامل

Grammar Checker Features for Author Identification and Author Profiling Notebook for PAN at CLEF 2013

Our work on author identification and author profiling is based on the question: Can the number and the types of grammatical errors serve as indicators for a specific author or a group of people? In order to detect the grammatical errors we base our approach on the output of the open-source library LanguageTool. In the case of the author identification we transform the problem into a statistica...

متن کامل

IRDDS: Instance reduction based on Distance-based decision surface

In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful for classifiers. Therefore, it is convenient to discard irrelevant instances from the training set. This process is known as instance reduction, which is an important task for classifiers since through this process the time for classif...

متن کامل

Transition Potential Modeling of Land-Cover based on Similarity Weighted Instance-based Learning Procedure and Its Implication in the REDD Project Design Document

  Reducing Emissions from Deforestation and Forest Degradation (REDD) is a climate change mitigation strategy employed to reduce the intensity of deforestation and GHGS emissions. In recent decades, drastic land use changes in Mazandaran province caused a substantial reduction in the amount of Hyrcanian forests. The present research based on objectives of REDD projects paid to identify of fore...

متن کامل

Sub-Profiling by Linguistic Dimensions to Solve the Authorship Attribution Task

In this paper, we describe a modified version of the profile-based approach for the Authorship Attribution (AA) task of the PAN 2012 challenge. Our PAN system for AA utilizes the concept of linguistic modalities on profile-based (PB) approaches. We concatenate all the training documents from the same author and build author-specific sub-profiles, one per linguistic modality. Then instead of usi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017